33 research outputs found

Neural Architecture Search as Program Transformation Exploration

Author: Crowley Elliot J
O'Boyle Michael F P
Turner Jack
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 12/02/2021

Improving the performance of deep neural networks (DNNs) is important to both the compiler and neural architecture search (NAS) communities. Compilers apply program transformations in order to exploit hardware parallelism and memory hierarchy. However, legality concerns mean they fail to exploit the natural robustness of neural networks. In contrast, NAS techniques mutate networks by operations such as the grouping or bottlenecking of convolutions, exploiting the resilience of DNNs. In this work, we express such neural architecture operations as program transformations whose legality depends on a notion of representational capacity. This allows them to be combined with existing transformations into a unified optimization framework. This unification allows us to express existing NAS operations as combinations of simpler transformations. Crucially, it allows us to generate and explore new tensor convolutions. We prototyped the combined framework in TVM and were able to find optimizations across different DNNs, that significantly reduce inference time - over 3× in the majority of cases. Furthermore, our scheme dramatically reduces NAS search time. Code is available at https://github.com/jack-willturner/nas-as-program-transformation-exploration
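
As a rough illustration of the "grouping" operation this abstract mentions, the sketch below counts the weights of a standard convolution and of a grouped one. It is not taken from the paper or its repository; the helper name conv_params and the channel sizes are hypothetical, and it relies only on the standard weight-count formula for a grouped 2D convolution.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a 2D convolution with a k x k kernel (bias omitted).

    With grouping, each output channel only sees c_in / groups input
    channels, so the weight count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("groups must divide both channel counts")
    return (c_in // groups) * c_out * k * k


# Hypothetical layer: 64 input channels, 128 output channels, 3x3 kernel.
standard = conv_params(64, 128, 3)           # 64 * 128 * 9 weights
grouped = conv_params(64, 128, 3, groups=4)  # 16 * 128 * 9 weights
print(standard, grouped, standard // grouped)
```

The grouped layer computes a different function from the standard one, which is why a conventional compiler would treat the rewrite as illegal; the abstract's point is that such rewrites can instead be judged by representational capacity.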

arXiv.org e-Print Archive

Edinburgh Research Explorer

mlirSynth: Automatic, Retargetable Program Raising in Multi-Level IR using Program Synthesis

Author: Brauckmann Alexander
Grosser Tobias
O'Boyle Michael F P
Polgreen Elizabeth
Publication date: 27/12/2023

MLIR is an emerging compiler infrastructure for modern hardware, but existing programs cannot take advantage of MLIR’s high-performance compilation if they are described in lower-level general purpose languages. Consequently, to avoid programs needing to be rewritten manually, this has led to efforts to automatically raise lower-level to higher-level dialects in MLIR. However, current methods rely on manually-defined raising rules, which limit their applicability and make them challenging to maintain as MLIR dialects evolve. We present mlirSynth – a novel approach which translates programs from lower-level MLIR dialects to high-level ones without manually defined rules. Instead, it uses available dialect definitions to construct a program space and searches it effectively using type constraints and equivalences. We demonstrate its effectiveness by raising C programs to two distinct high-level MLIR dialects, which enables us to use existing high-level dialect specific compilation flows. On Polybench, we show a greater coverage than previous approaches, resulting in geomean speedups of 2.5x (Intel) and 3.4x (AMD) over state-of-the-art compilation flows. mlirSynth also enables retargetability to domain-specific accelerators, resulting in a geomean speedup of 21.6x on a TPU

Edinburgh Research Explorer

mlirSynth: Automatic, Retargetable Program Raising in Multi-Level IR using Program Synthesis

Author: Brauckmann Alexander
Grosser Tobias
O'Boyle Michael F. P.
Polgreen Elizabeth
Publication date: 06/10/2023

MLIR is an emerging compiler infrastructure for modern hardware, but existing programs cannot take advantage of MLIR's high-performance compilation if they are described in lower-level general purpose languages. Consequently, to avoid programs needing to be rewritten manually, this has led to efforts to automatically raise lower-level to higher-level dialects in MLIR. However, current methods rely on manually-defined raising rules, which limit their applicability and make them challenging to maintain as MLIR dialects evolve. We present mlirSynth -- a novel approach which translates programs from lower-level MLIR dialects to high-level ones without manually defined rules. Instead, it uses available dialect definitions to construct a program space and searches it effectively using type constraints and equivalences. We demonstrate its effectiveness by raising C programs to two distinct high-level MLIR dialects, which enables us to use existing high-level dialect specific compilation flows. On Polybench, we show a greater coverage than previous approaches, resulting in geomean speedups of 2.5x (Intel) and 3.4x (AMD) over state-of-the-art compilation flows for the C programming language. mlirSynth also enables retargetability to domain-specific accelerators, resulting in a geomean speedup of 21.6x on a TPU

arXiv.org e-Print Archive

Portable and Transparent Software Managed Scheduling on Accelerators for Fair Resource Sharing

Author: Margiolas Christos
O'Boyle Michael F. P.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Edinburgh Research Explorer

Bind the Gap: Compiling Real Software to Hardware FFT Accelerators

Author: Ainsworth Sam
Armengol Estapé Jordi
O'Boyle Michael F P
Woodruff Jackson
Publication date: 09/06/2022

Edinburgh Research Explorer

Expert Programmer versus Parallelizing Compiler: A Comparative Study of Two Approaches for Distributed Shared Memory

Author: Bull J. Mark
O'Boyle Michael F. P.
Publication venue: 'Hindawi Limited'
Publication date: 01/01/1996

This article critically examines current parallel programming practice and optimizing compiler development. The general strategies employed by compiler and programmer to optimize a Fortran program are described, and then illustrated for a specific case by applying them to a well-known scientific program, TRED2, using the KSR-1 as the target architecture. Extensive measurement is applied to the resulting versions of the program, which are compared with a version produced by a commercial optimizing compiler, KAP. The compiler strategy significantly outperforms KAP and does not fall far short of the performance achieved by the programmer. Following the experimental section each approach is critiqued by the other. Perceived flaws, advantages, and common ground are outlined, with an eye to improving both schemes

Crossref

Directory of Open Access Journals

Edinburgh Research Explorer

HETSIM: Simulating Large-Scale Heterogeneous Systems using a Trace-driven, Synchronization and Dependency-Aware Framework

Author: Cole Murray
Dreslinski Ronald
Kaszyk Kuba
O'Boyle Michael F P
Pal Subhankar
Publication date: 12/08/2020

Edinburgh Research Explorer

C2TACO: Lifting Tensor Code to TACO

Author: De Souza Magalhães José Wesley
O'Boyle Michael F P
Polgreen Elizabeth
Woodruff Jackson
Publication date: 22/10/2023

Edinburgh Research Explorer

Rewriting History: Repurposing Domain-Specific CGRAs

Author: Ainsworth Sam
Brauckmann Alexander
Cummins Chris
Koehler Thomas
O'Boyle Michael F. P.
Woodruff Jackson
Publication date: 16/09/2023

Coarse-grained reconfigurable arrays (CGRAs) are domain-specific devices promising both the flexibility of FPGAs and the performance of ASICs. However, with restricted domains comes a danger: designing chips that cannot accelerate enough current and future software to justify the hardware cost. We introduce FlexC, the first flexible CGRA compiler, which allows CGRAs to be adapted to operations they do not natively support. FlexC uses dataflow rewriting, replacing unsupported regions of code with equivalent operations that are supported by the CGRA. We use equality saturation, a technique enabling efficient exploration of a large space of rewrite rules, to effectively search through the program-space for supported programs. We applied FlexC to over 2,000 loop kernels, compiling to four different research CGRAs and 300 generated CGRAs and demonstrate a 2.2× increase in the number of loop kernels accelerated leading to 3× speedup compared to an Arm A5 CPU on kernels that would otherwise be unsupported by the accelerator

arXiv.org e-Print Archive